Handwriting Recognition (HR) of Family History Documents using a 2-D Warping-based Word-level HR Approach
Abstract
An enormous amount of handwritten information exists that is potentially very useful for family history research. However, finding information of interest is a daunting task unless the handwriting is transcribed or indexed so that it can be searched digitally. Transcription and indexing are typically done manually because automatic handwriting recognition (HR) is not yet accurate enough to provide reliable transcriptions. Since manual transcription is both costly and time-consuming, improvements in HR are very desirable. In this paper, we describe a novel method of word-level HR that we recently published at the International Conference on Document Analysis and Recognition (ICDAR 2011) and discuss how it can be applied to family history document images. We use an automatic morphing algorithm to generate a 2-D geometric warp that aligns each unknown word to known training examples. Once the word strokes are aligned, a distance map is used to calculate how different the aligned (warped) word is from the training example. The label of the most similar training example is used as the digital transcription of the previously unknown word. Our initial results are based on two datasets, each consisting of 1,000 training words and 1,000 test words. For in-vocabulary words, we obtain word recognition accuracies of 88.77% and 89.33%, respectively.
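The matching step described in the abstract (score each training example with a distance map, then take the label of the closest one) can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D warp is omitted entirely, the images are assumed to be equal-sized binary grids (1 = ink), and the names `distance_map`, `word_distance`, and `classify` as well as the symmetric mean-distance score are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def distance_map(img: Image) -> List[List[float]]:
    """Brute-force distance map: Euclidean distance from every pixel
    to the nearest ink pixel of img (fine for tiny illustrative images)."""
    ink = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not ink:
        raise ValueError("image contains no ink pixels")
    return [
        [min(((r - ir) ** 2 + (c - ic) ** 2) ** 0.5 for ir, ic in ink)
         for c in range(len(row))]
        for r, row in enumerate(img)
    ]


def one_way_cost(src: Image, dst_map: List[List[float]]) -> float:
    """Mean distance from each ink pixel of src to the nearest ink
    pixel of the image that dst_map was computed from."""
    vals = [dst_map[r][c]
            for r, row in enumerate(src) for c, v in enumerate(row) if v]
    return sum(vals) / len(vals)


def word_distance(a: Image, b: Image) -> float:
    """Symmetric dissimilarity (an assumed scoring rule, not the paper's)."""
    return one_way_cost(a, distance_map(b)) + one_way_cost(b, distance_map(a))


def classify(unknown: Image, training: List[Tuple[Image, str]]) -> str:
    """Return the label of the most similar training example.
    In the described method, the unknown word would first be warped
    onto each training example; that alignment step is skipped here."""
    return min(training, key=lambda ex: word_distance(unknown, ex[0]))[1]


# Toy data: two "words" drawn on 3x3 grids, plus a distorted vertical stroke.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
unknown = [[0, 1, 0], [0, 1, 0], [0, 0, 1]]

training = [(vertical, "vertical"), (horizontal, "horizontal")]
print(classify(unknown, training))
```

In this toy setup the distorted stroke is assigned the label of the vertical example because its ink lies closer, on average, to that example's ink. A real system would need the warp to absorb handwriting variation before the distance is measured.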
Related Resources
Many-Author Offline Handwriting Recognition Using a Warping-Based Approach
Optical Character Recognition (OCR) software allows computers to automatically convert scanned pages of typed or machine-printed text into searchable digital formats for use by humans. However, automatically transcribing or indexing materials that are handwritten (instead of machine-printed) is a much more difficult problem that is still not completely solved. Solving this problem will be of en...
Indexing of Handwritten Historical Documents - Recent Progress
Indexing and searching collections of handwritten archival documents and manuscripts has always been a challenge because handwriting recognizers do not perform well on such noisy documents. Given a collection of documents written by a single author (or a few authors), one can apply a technique called word spotting. The approach is to cluster word images based on their visual appearance, after s...
Transcript mapping for handwritten Arabic documents
Handwriting recognition research requires large databases of word images each of which is labeled with the word it contains. Full images scanned in, however, usually contain sentences or paragraphs of writing. The creation of labeled databases of images of isolated words is usually tedious, requiring a person to drag a rectangle around each word in the full image and type in the label. Transcri...
Indexing and Retrieval of Degraded Handwritten Medical Forms
The tasks of indexing and retrieval are specifically challenging for the erroneous output of handwriting recognition (HR) systems. This paper proposes an approach of indexing and retrieving degraded documents with very low recognition rates. We present a modified version of the popular Vector Model in information retrieval (IR). Our model incorporates top n candidates from a HR system into the ...
Word Image Matching Using Dynamic Time Warping
Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labour and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting...